Effectiveness of Compiler-Directed Prefetching on Data Mining Benchmarks

نویسندگان

  • Ragavendra Natarajan
  • Vineeth Mekkat
  • Wei-Chung Hsu
  • Antonia Zhai
چکیده

For today's increasingly power-constrained multicore systems, integrating simpler and more energy-e±cient in-order cores becomes attractive. However, since in-order processors lack complex hardware support for tolerating long-latency memory accesses, developing compiler technologies to hide such latencies becomes critical. Compiler-directed prefetching has been demonstrated e®ective on some applications. On the application side, a large class of data centric applications has emerged to explore the underlying properties of the explosively growing data. These applications, in contrast to traditional benchmarks, are characterized by substantial thread-level parallelism, complex and unpredictable control °ow, as well as intensive and irregular memory access patterns. These applications are expected to be the dominating workloads on future microprocessors. Thus, in this paper, we investigated the e®ectiveness of compiler-directed prefetching on data mining applications in in-order multicore systems. Our study reveals that although properly inserted prefetch instructions can often e®ectively reduce memory access latencies for data mining applications, the compiler is not always able to exploit this potential. Compiler-directed prefetching can become ine±cient in the presence of complex control °ow and memory access patterns; and architecture dependent behaviors. The integration of multithreaded execution onto a single die makes it even more di±cult for the compiler to insert prefetch instructions, since optimizations that are e®ective for single-threaded execution may or may not be e®ective in multithreaded execution. Thus, compiler-directed prefetching must be judiciously deployed to avoid creating performance bottlenecks that

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Interaction and Relative Effectiveness of Hardware and Software Data Prefetch

A major performance limiter in modern processors is the long latencies caused by data cache misses. Both compiler and hardware based prefetching schemes help hide these latencies and so improve performance. Compiler techniques infer memory access patterns through code analysis, and insert appropriate prefetch instructions. Hardware prefetching techniques work independently from the compiler by ...

متن کامل

A Compiler-Assisted Data Prefetch Controller

Data-intensive applications often exhibit memory referencing patterns with little data reuse, resulting in poor cache utilization and run-times that can be dominated by memory delays. Data prefetching has been proposed as a means of hiding the memory access latencies of data referencing patterns that defeat caching strategies. Prefetching techniques that either use special cache logic to issue ...

متن کامل

C-Miner: Mining Block Correlations in Storage Systems

Block correlations are common semantic patterns in storage systems. These correlations can be exploited for improving the effectiveness of storage caching, prefetching, data layout and disk scheduling. Unfortunately, information about block correlations is not available at the storage system level. Previous approaches for discovering file correlations in file systems do not scale well enough to...

متن کامل

Compiler Techniques for Software Prefetching on Cache-Coherent Shared-Memory Multiprocessors

This document describes a set of new techniques for improving the eeciency of compiler-directed software prefetching for parallel Fortran programs running on cache-coherent DSM (distributed shared memory) multiprocessors. The key component used in this scheme is a data ow framework that exploits information about array access patterns and about the cache coherence protocol to predict at compile...

متن کامل

Eecient Integration of Compiler-directed Cache Coherence and Data Prefetching Compiler-directed Cache Coherence and Data Prefetching

Cache coherence enforcement and memory latency reduction and hiding are very important and challenging problems in the design of large-scale distributed shared-memory (DSM) multiprocessors. We propose an integrated approach to solve these problems through a compiler-directed cache coherence scheme called the Cache Coherence with Data Prefetching (CCDP) scheme. The CCDP scheme enforces cache coh...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • Journal of Circuits, Systems, and Computers

دوره 21  شماره 

صفحات  -

تاریخ انتشار 2012